Day 12: Introduction to Hugging Face Ecosystem
Hugging Face is often called the GitHub of the AI/ML community: you can share models, datasets, and demos, and the Transformers library lets you run almost any published model in just a few lines of code.
Hugging Face Ecosystem Components
| Service | Role | URL |
|---|---|---|
| Hub (Models) | Repository for 500K+ models | huggingface.co/models |
| Hub (Datasets) | Repository for 100K+ datasets | huggingface.co/datasets |
| Spaces | Demo app hosting (Gradio, Streamlit) | huggingface.co/spaces |
| Transformers | Model loading/inference library | pip install transformers |
| Datasets | Dataset loading/processing library | pip install datasets |
| PEFT | Efficient fine-tuning (LoRA, etc.) | pip install peft |
| TRL | RLHF/DPO training library | pip install trl |
| Accelerate | Multi-GPU/TPU training | pip install accelerate |
Account Creation and Token Setup
```python
# Step 1: Create an account at https://huggingface.co
# Step 2: Generate a token at https://huggingface.co/settings/tokens
# Step 3: Log in

# Method 1: CLI login
#   pip install huggingface_hub
#   huggingface-cli login

# Method 2: Login from Python
from huggingface_hub import login

login(token="hf_YOUR_TOKEN_HERE")  # reading from the HF_TOKEN environment variable is recommended

# Method 3: Environment variable (.env file)
#   HF_TOKEN=hf_YOUR_TOKEN_HERE
```
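As a sketch of Method 3, the token can be read from the environment instead of being hard-coded in source. The helper name `get_hf_token` is ours for illustration, not part of `huggingface_hub`:

```python
import os

def get_hf_token():
    """Read the token from the HF_TOKEN environment variable (raises if unset)."""
    token = os.environ.get("HF_TOKEN")
    if token is None:
        raise RuntimeError("HF_TOKEN is not set; add it to your .env file or shell profile")
    return token

# Once HF_TOKEN is set, log in without the token ever appearing in your code:
# from huggingface_hub import login
# login(token=get_hf_token())
```

This keeps tokens out of version control, which matters because a leaked write token grants access to your whole Hub account.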
Using Models with Transformers
```python
# pip install transformers torch
from transformers import pipeline

# Sentiment analysis (one line!)
classifier = pipeline("sentiment-analysis")
result = classifier("I love learning about LLMs!")
print(result)  # [{'label': 'POSITIVE', 'score': 0.9998}]

# Text generation
generator = pipeline("text-generation", model="gpt2")
output = generator("The future of AI is", max_new_tokens=30)
print(output[0]["generated_text"])

# Translation
translator = pipeline("translation_en_to_fr", model="Helsinki-NLP/opus-mt-en-fr")
result = translator("How are you today?")
print(result)  # [{'translation_text': "Comment allez-vous aujourd'hui ?"}]

# Question answering
qa = pipeline("question-answering")
result = qa(
    question="What is Hugging Face?",
    context="Hugging Face is a platform for sharing AI models and datasets.",
)
print(f"Answer: {result['answer']} (confidence: {result['score']:.2%})")
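Each pipeline above returns a list of dicts with `label`/`score`-style keys, so post-processing is plain Python. As an illustrative sketch (the helper `top_label` is ours, not part of Transformers), picking the winning label from a classification pipeline asked for all labels (`top_k=None`) might look like:

```python
def top_label(label_scores):
    """Return the (label, score) pair with the highest score.

    `label_scores` is a list of {'label': ..., 'score': ...} dicts, the shape
    a classification pipeline returns per input when top_k=None.
    (Illustrative helper, not part of the transformers library.)
    """
    best = max(label_scores, key=lambda r: r["score"])
    return best["label"], best["score"]

# Hard-coded input shaped like a sentiment pipeline's output:
print(top_label([{"label": "NEGATIVE", "score": 0.0002},
                 {"label": "POSITIVE", "score": 0.9998}]))  # → ('POSITIVE', 0.9998)
```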
Datasets Library
```python
# pip install datasets
from datasets import load_dataset

# Load a popular dataset (automatic download + caching)
dataset = load_dataset("squad", split="train[:100]")
print(f"Number of samples: {len(dataset)}")
print(f"Columns: {dataset.column_names}")
print(f"First example: {dataset[0]['question']}")

# Korean dataset
ko_dataset = load_dataset("kor_nlu", "sts", split="train[:50]")
print(f"\nKorean NLU data: {len(ko_dataset)} samples")

# Dataset preprocessing
def preprocess(example):
    example["question_length"] = len(example["question"])
    return example

processed = dataset.map(preprocess)
print(f"Average question length: {sum(processed['question_length']) / len(processed):.0f} chars")
```
Searching and Downloading Models from Hub
```python
from huggingface_hub import HfApi

api = HfApi()

# Search for Korean models
models = list(api.list_models(
    search="korean",
    sort="downloads",
    direction=-1,
    limit=5,
))
print("Popular Korean-related models:")
for model in models:
    print(f"  {model.id} (downloads: {model.downloads:,})")

# Check a specific model's info (gated models like Llama still expose public metadata)
model_info = api.model_info("meta-llama/Meta-Llama-3.1-8B-Instruct")
print(f"\nModel: {model_info.id}")
print(f"Downloads: {model_info.downloads:,}")
print(f"Likes: {model_info.likes:,}")
print(f"Tags: {model_info.tags[:5]}")
```
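Beyond searching metadata, individual files can be fetched directly with `hf_hub_download`. A minimal sketch (gpt2's `config.json` is tiny and ungated, so it makes a safe first download; repeated calls hit the local cache):

```python
from huggingface_hub import hf_hub_download

# Download one file from a model repo; returns the local cached path
path = hf_hub_download(repo_id="gpt2", filename="config.json")
print(path)
```

For full model weights you would normally let `transformers` handle downloading via `from_pretrained`, but direct file access is useful for inspecting configs and tokenizer files.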
Creating Demos with Hugging Face Spaces
Spaces is a service that hosts Gradio or Streamlit apps for free.
```python
# pip install gradio
import gradio as gr
from transformers import pipeline

# Simple sentiment analysis demo
classifier = pipeline("sentiment-analysis")

def analyze_sentiment(text):
    result = classifier(text)[0]
    return f"{result['label']} (confidence: {result['score']:.2%})"

demo = gr.Interface(
    fn=analyze_sentiment,
    inputs=gr.Textbox(label="Text Input", placeholder="Enter a sentence to analyze"),
    outputs=gr.Textbox(label="Sentiment Analysis Result"),
    title="Sentiment Analysis Demo",
    description="Analyzes the sentiment (positive/negative) of text.",
)
demo.launch()

# Deploy to Spaces: huggingface-cli repo create, then git push
```
Hugging Face is essential infrastructure for LLM development. From model downloads to fine-tuning and deployment, everything can be handled within this ecosystem. Starting next week, we will use these tools to begin hands-on projects.
Today’s Exercises
- Create a Hugging Face account and obtain a token. Run GPT-2 using `pipeline("text-generation")` and compare the generation results for Korean and English inputs.
- Find and load a Korean dataset using the `datasets` library, then print the data structure and the first 5 samples.
- Create a simple text summarization demo with Gradio. You can use `pipeline("summarization")`.